Software Defect Prediction Based on GUHA Data Mining Procedure and Multi-Objective Pareto Efficient Rule Selection

نویسندگان

  • Bharavi Mishra
  • Kaushal K. Shukla
چکیده

Software defect prediction, if is effective, enables the developers to distribute their testing efforts efficiently and let them focus on defect prone modules. It would be very resource consuming to test all the modules while the defect lies in fraction of modules. Information about fault-proneness of classes and methods can be used to develop new strategies which can help mitigate the overall development cost and increase the customer satisfaction. Several machine learning strategies have been used in recent past to identify defective modules. These models are built using publicly available historical software defect data sets. Most of the proposed techniques are not able to deal with the class imbalance problem efficiently. Therefore, it is necessary to develop a prediction model which consists of small simple and comprehensible rules. Considering these facts, in this paper, the authors propose a novel defect prediction approach named GUHA based Classification Association Rule Mining algorithm (G-CARM) where “GUHA” stands for General Unary Hypothesis Automaton. G-CARM approach is primarily based on Classification Association Rule Mining, and deploys a two stage process involving attribute discretization, and rule generation using GUHA. GUHA is oldest yet very powerful method of pattern mining. The basic idea of GUHA procedure is to mine the interesting attribute patterns that indicate defect proneness. The new method has been compared against five other models reported in recent literature viz. Naive Bayes, Support Vector Machine, RIPPER, J48 and Nearest Neighbour classifier by using several measures, including AUC and probability of detection. The experimental results indicate that the prediction performance of G-CARM approach is better than other prediction approaches. The authors’ approach achieved 76% mean recall and 83% mean precision for defective modules and 93% mean recall and 83% mean precision for non-defective modules on CM1, KC1, KC2 and Eclipse data sets. Further defect rule generation process often generates a large number of rules which require considerable efforts while using these rules as a defect predictor, hence, a rule sub-set selection process is also proposed to select best set of rules according to the requirements. Evolution criteria for defect prediction like sensitivity, specificity, precision often compete against each other. It is therefore, important to use multi-objective optimization algorithms for selecting prediction rules. In this paper the authors report prediction rules that are Pareto efficient in the sense that no further improvements in the rule set is possible without sacrificing some performance criteria. Non-Dominated Sorting Genetic Algorithm has been used to find Pareto front and defect prediction rules. Software Defect Prediction Based on GUHA Data Mining Procedure and Multi-Objective Pareto Efficient Rule Selection

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Mining Attributes Patterns of Defective Modules for Object Oriented Software

Defect prediction is the process of predicting the fault prone module using some pre-mined patterns or rules. Several statistical and mathematical strategies have been used in recent past to mine these rules. However, the interpretability of these rules is the matter of concern. In real development process an expert is required to demonstrate the working of mined patterns which prevents the use...

متن کامل

A Novel Method for Selecting the Supplier Based on Association Rule Mining

One of important problems in supply chains management is supplier selection. In a company, there are massive data from various departments so that extracting knowledge from the company’s data is too complicated. Many researchers have solved this problem by some methods like fuzzy set theory, goal programming, multi objective programming, the liner programming, mixed integer programming, analyti...

متن کامل

PSO for multi-objective problems: Criteria for leader selection and uniformity distribution

This paper proposes a method to solve multi-objective problems using improved Particle Swarm Optimization. We propose leader particles which guide other particles inside the problem domain. Two techniques are suggested for selection and deletion of such particles to improve the optimal solutions. The first one is based on the mean of the m optimal particles and the second one is based on appoin...

متن کامل

Multiobjective Classification Rule Mining

In this chapter, we discuss the application of evolutionary multiobjective optimization (EMO) to association rule mining. Especially, we focus our attention on classification rule mining in a continuous feature space where the antecedent and consequent parts of each rule are an interval vector and a class label, respectively. First we explain evolutionary multiobjective classification rule mini...

متن کامل

A role for Pareto optimality in mining performance data

Improvements in performance modeling and identification of computational regimes within software libraries is a critical first step in developing software libraries that are truly agile with respect to the application as well as to the hardware. It is shown here that Pareto ranking, a concept from multi-objective optimization, can be an effective tool for mining large performance datasets. The ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • IJSSCI

دوره 6  شماره 

صفحات  -

تاریخ انتشار 2014